############################
# PAS-seq Pipeline using Bowtie1.0
############################

0. Preprocess
# Remove reads that do not contain a polyA stretch, and strip the barcode from each remaining read.
# Use "preprocess.py" in the myPythonTool folder.

1. Alignment
module load bowtie/1.0.0
bowtie [mm9_indexed_refGenome] -q [input.fastq] -n 2 -m 1 -S > [output.sam] 2> [output.log]

1.5 Remove internal priming reads

2. Remove duplicates
# Convert the SAM file to BED format so that duplicates can be removed with "PCRdup_BEDcombiner.py". This step was later dropped in "tophat_process".
# "PAS_incrementer.py" also takes BED as its input format.
convert2bed --input=sam [--output=bed] < [input.sam] > [output.bed]

3. Prepare BAMs
# Convert the BED file to BAM format so that "bamCoverage" can be applied to generate bigWigs. I later switched to "bedGraphToBigWig".
module load bedtools/2.23.0
bedtools bedtobam -i [input.bed] -g [genome_reference] > [output.bam]
module load samtools/1.1
samtools sort -T [temp.file] -o [output.sorted.bam] [input.bam]

4. Generate tracks
# Generate tracks for the forward strand (.f), the reverse strand (.r), and both strands combined (.c).
# bamCoverage is part of deepTools.
bamCoverage -b [input.sorted.bam] --normalizeUsingRPKM --samFlagExclude 16 --ignoreDuplicates --binSize=2 -p 32 -o [output.f.bw]
bamCoverage -b [input.sorted.bam] --normalizeUsingRPKM --samFlagInclude 16 --ignoreDuplicates --binSize=2 -p 32 -o [output.r.bw]
bamCoverage -b [input.sorted.bam] --normalizeUsingRPKM --ignoreDuplicates --binSize=2 -p 32 -o [output.c.bw]

5. Generate read counts table
# Use the masterlist "Tian PAS data.txt" as reference, available at:
# https://cbcl.ics.uci.edu/public_data/shilab/PASseq_szang_20160302/Tian%20PAS%20data.txt
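
# Note on step 2 (Remove duplicates): the contents of "PCRdup_BEDcombiner.py" are not shown in these notes,
# so the following is only a minimal sketch of one common way to drop PCR duplicates from a BED file.
# It is not the actual script. Assumptions: the BED has at least 6 tab-separated columns (as convert2bed
# writes for SAM input), and two reads count as duplicates when chrom, start, end and strand all match.
# The function name dedup_bed is hypothetical.

```python
def dedup_bed(lines):
    """Yield the first BED record seen for each (chrom, start, end, strand).

    `lines` is any iterable of BED lines (for example an open file).
    Header lines and blank lines are skipped.
    """
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError("expected at least 6 BED columns: %r" % line)
        # Assumed duplicate key: same coordinates on the same strand.
        key = (fields[0], int(fields[1]), int(fields[2]), fields[5])
        if key in seen:
            continue  # later copies of the same alignment are dropped
        seen.add(key)
        yield line


if __name__ == "__main__":
    # Small in-memory demo; the read names and coordinates are made up.
    demo = [
        "chr1\t100\t136\tread1\t255\t+",
        "chr1\t100\t136\tread2\t255\t+",  # duplicate of read1
        "chr1\t100\t136\tread3\t255\t-",  # same span, other strand: kept
        "chr2\t5\t41\tread4\t255\t+",
    ]
    for record in dedup_bed(demo):
        print(record)
```

# On a real file, pass an open handle in place of the list, e.g. dedup_bed(open("[input.bed]")), and write
# the yielded lines to "[output.bed]". Keeping the strand in the key matters here because the forward and
# reverse tracks in step 4 are built separately.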